# RISC-V Processor Design

Daniel Chen, Berkeley EECS 251LA

# **5-Stage Pipeline**: IF / ID / EX / MEM / WB



#### **Control Unit**

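- Input:
  - Opcode
  - funct3 (only 1 bit, the MSB, is used; it separates the two CSR instructions)
    - CSRRW: **0**01
    - CSRRWI: **1**01
- Output:
  - Asel, Bsel
    - Select PC or data from register
  - if\_rs1, if\_rs2
    - Flag whether the instruction reads rs1/rs2; sent to the forwarding unit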

```
`OPC_ARI_RTYPE:
begin
   if_rs1 = 1'd1;      // instruction reads rs1
   if_rs2 = 1'd1;      // instruction reads rs2
   btype = 1'd0;
   jtype = 1'd0;
   stype = 1'd0;
   csrtype = 1'd0;
   RegWEn = 1'd1;      // write the result to the register file
   Asel = 1'd0;        // hazards are not handled here
   Bsel = 1'd0;        // hazards are not handled here
   memRW = 1'd0;       // no memory write
   memtoreg = 2'd1;    // write back alu_out
   dataW_sel = 2'd0;   // don't care
end
```

#### Forwarding Unit

- Generate A\_sel/ B\_sel
- For B-type:
  - Use the ALU to calculate PC\_nxt
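
As a rough illustration of how a forwarding unit can derive an operand select, here is a minimal behavioral sketch in Python (chosen so it can be run directly, not the design's Verilog). The select encoding (`REG`, `FWD_MEM`, `FWD_WB`), the function name, and every argument other than the idea behind `if_rs1`/`if_rs2` are assumptions for illustration, not taken from the design.

```python
# Hypothetical select encoding; the real A_sel/B_sel values are not given in the slides.
REG, FWD_MEM, FWD_WB = 0, 1, 2


def forward_sel(uses_rs, rs, mem_rd, mem_wen, wb_rd, wb_wen):
    """Choose the source of one ALU operand.

    uses_rs : flag from the control unit (plays the role of if_rs1 / if_rs2)
    rs      : source register index of the instruction in EX
    mem_rd, wb_rd   : destination registers of the instructions in MEM / WB
    mem_wen, wb_wen : whether those instructions write the register file
    """
    if not uses_rs or rs == 0:          # x0 is never forwarded
        return REG
    if mem_wen and mem_rd == rs:        # the younger result takes priority
        return FWD_MEM
    if wb_wen and wb_rd == rs:
        return FWD_WB
    return REG


if __name__ == "__main__":
    # rs1 = x5 is produced by the instruction currently in MEM
    print(forward_sel(True, 5, mem_rd=5, mem_wen=True, wb_rd=7, wb_wen=True))
```

The MEM-stage check comes first so that, when both older instructions write the same register, the most recent value is used.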



#### Branch Predictor (ID-stage)

| Benchmark (# of cycles) | w/o BrPred | 2-bit | 1-bit | Always Taken |
|-------------------------|------------|-------|-------|--------------|
| cachetest   | 90410      | 90409 | 90410 | 88368        |
| final       | 6415       | 6408  | 6415  | 6264         |
| fib         | 5558       | 5558  | 5558  | 5412         |
| sum         | 76102      | 76037 | 76102 | 73961        |
| replace     | 76176      | 76109 | 76176 | 74038        |

Cycle counts were measured on test\_bmark\_short.
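
For reference, a 2-bit predictor is commonly built as a saturating counter. The sketch below is a generic Python model of that idea next to an always-taken predictor; the class names, the initial counter state, and the loop-branch trace are assumptions for illustration and do not describe the predictor in this design.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):        # initial state is an assumption
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class AlwaysTaken:
    """Static predictor that always predicts taken."""

    def predict(self):
        return True

    def update(self, taken):
        pass


def mispredictions(outcomes, predictor):
    """Count wrong predictions over a sequence of branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch: taken 9 times, then falls through; run twice.
    trace = ([True] * 9 + [False]) * 2
    print("2-bit       :", mispredictions(trace, TwoBitPredictor()))
    print("always taken:", mispredictions(trace, AlwaysTaken()))
```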

#### Cache (Write Back + Allocate on miss)

#### **Direct-Mapped**

- Data: 4 x SRAM1RW 256x128 (32 entries used)
- Dirty/Valid/Tag:
  - Implemented with regs



#### 2-Way Associative

- Data: 8 x SRAM1RW 256x128 (32 entries used)
- Replacement:
  - Least Recently Used policy
- LRU/Dirty/Valid/Tag:
  - Implemented with regs
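
With two ways, LRU needs only one bit per set: the bit names the way that was not used most recently. A minimal Python model of a single set, assuming that one-bit scheme (the class and field names are hypothetical, and dirty/valid handling is omitted):

```python
class TwoWaySet:
    """One set of a 2-way associative cache with a single LRU bit."""

    def __init__(self):
        self.tags = [None, None]   # tag stored in way 0 and way 1
        self.lru = 0               # way to evict on the next miss

    def access(self, tag):
        """Return True on a hit; on a miss, replace the LRU way."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way         # the other way is now least recently used
        return hit


if __name__ == "__main__":
    s = TwoWaySet()
    for t in ["A", "B", "A", "C", "A", "B"]:
        print(t, "hit" if s.access(t) else "miss")
```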



#### **Read Only** Cache

- For I\$
- INIT state:
  - SRAM is not yet initialized



#### Cache with **Write Buffer**

[Figure: **Read and Write** state diagrams of the cache FSMs, each state annotated with its CPU Interface, Cache Interface, and Memory Interface actions.]

- State labels in the diagrams: IDLE, Before Read, Read Mem, Write, Before Write, Write Back, Before Idle
- Transition signals: hit / miss, mem\_req\_rdy, mem\_req\_data\_rdy, mem\_resp\_valid, wb\_cnt == 3'd3, wb\_state == IDLE
- CPU Interface: stall on a miss, do not stall on a hit
- Cache Interface: on a refill, write mem\_resp\_data into the cache and update valid, tag, dirty

#### Functionality (test\_asm/ test\_bmark)

```
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/addi.out   after 381 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/add.out   after 689 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/andi.out   after 321 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/and.out   after 707 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/auipc.out   after 130 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/beq.out   after 496 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/bge.out   after 535 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/bgeu.out   after 550 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/blt.out   after 495 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/bltu.out   after 508 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/bne.out   after 492 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/jal.out   after 127 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/jalr.out   after 227 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lb.out   after 386 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lbu.out   after 386 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lh.out   after 414 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lhu.out   after 415 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lui.out   after 139 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/lw.out   after 410 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/ori.out   after 331 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/or.out   after 704 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sb.out   after 651 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sh.out   after 732 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/simple.out   after 100 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/slli.out   after 387 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sll.out   after 734 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/slti.out   after 381 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sltiu.out   after 379 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/slt.out   after 683 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sltu.out   after 683 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/srai.out   after 402 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sra.out   after 755 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/srli.out   after 396 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/srl.out   after 744 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sub.out   after 678 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/sw.out   after 730 simulation cycles
PASSED   /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/xori.out   after 333 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/asm_output/xor.out   after 707 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/bmark_output/cachetest.out   after 3016974 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/bmark_output/final.out   after 6264 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/bmark_output/fib.out   after 5412 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/bmark_output/sum.out   after 19091565 simulation cycles
         /home/cc/eecs151/fl21/class/eecs151-abv/scratch/eecs151-abv/project/bmark_output/replace.out   after 19559246 simulation cycles
```

# Runtime Analysis

#### (2-way associative + always taken)

| Test Cases | # of cycles | Clock period (ns) | Runtime (s)             |
|------------|-------------|-------------------|-------------------------|
| cachetest  | 3016974     | 1.3               | 3.922 × 10<sup>-3</sup> |
| final      | 6264        | 1.3               | 8.143 × 10<sup>-6</sup> |
| fib        | 5412        | 1.3               | 7.036 × 10<sup>-6</sup> |
| sum        | 19091565    | 1.3               | 2.482 × 10<sup>-2</sup> |
| replace    | 19559246    | 1.3               | 2.543 × 10<sup>-2</sup> |
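
The runtime column is the cycle count multiplied by the clock period. A short Python check of that arithmetic, using the cycle counts and the 1.3 ns period from the table (the function and variable names are only illustrative):

```python
CLOCK_PERIOD_NS = 1.3  # clock period from the table

# Cycle counts from the table (2-way associative cache, always-taken prediction)
CYCLES = {
    "cachetest": 3016974,
    "final": 6264,
    "fib": 5412,
    "sum": 19091565,
    "replace": 19559246,
}


def runtime_seconds(n_cycles, period_ns=CLOCK_PERIOD_NS):
    """Runtime = number of cycles x clock period, converted from ns to s."""
    return n_cycles * period_ns * 1e-9


if __name__ == "__main__":
    for name, n in CYCLES.items():
        print(f"{name:10s} {n:>10d} cycles  {runtime_seconds(n):.3e} s")
```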

#### Floorplan

- Floorplan dimensions:
  - Width x Height = $700 \text{ um} \times 800 \text{ um}$
  - Area = $560,000 \text{ um}^2$
- Area utilization (density):
  - Riscv\_top: 197711 um^2
  - 197711 / 560000 = 35.3%



# Power (Innovus Estimated)

- Static power estimation (activity factor = 0.2)

| Total Power           | Power       | Share    |
|-----------------------|-------------|----------|
| Total Internal Power  | 4.78074122  | 26.0319% |
| Total Switching Power | 11.55955628 | 62.9437% |
| Total Leakage Power   | 2.02461838  | 11.0244% |
| Total Power           | 18.36491586 |          |

| Group                 | Internal<br>Power | Switching<br>Power | Leakage<br>Power | Total<br>Power | Percentage<br>(%) |
|-----------------------|-------------------|--------------------|------------------|----------------|-------------------|
| Sequential            | 2.594             | 0.1136             | 0.5464           | 3.254          | 17.72             |
| Macro                 | 0                 | 0.1515             | 0                | 0.1515         | 0.8247            |
| IO                    | 0                 | 0                  | 5.536e-07        | 5.536e-07      | 3.014e-06         |
| Combinational         | 1.94              | 10.69              | 1.448            | 14.08          | 76.68             |
| Clock (Combinational) | 0.09363           | 0.3244             | 0.0001642        | 0.4182         | 2.277             |
| Clock (Sequential)    | 0.1527            | 0.2757             | 0.02959          | 0.458          | 2.494             |
| Total                 | 4.781             | 11.56              | 2.025            | 18.36          | 100               |

# **Clock Tree Routing**

- Insertion delay is quite high
  - Try different orientations/locations





#### Histogram

Not really ideal



#### Critical Path

- Light Blue line
- Start Point: dcache/from\_mem\_reg
- End Point: alu\_out\_MEM\_reg



#### Critical Path (post-syn)

- Clock Period = 1.3 ns
- Slack = 72 ps
- Startpoint:
  - dcache/cpu\_offset\_reg
- Endpoint:
  - pc\_add4\_reg

```
Path 1: MET (72 ps) Setup Check with Pin cpu/pc_add4_reg[30]/CLK->D
          View: PVT_0P63V_100C.setup_view
         Group: clk
    Startpoint: (R) mem/genblk1.dcache/cpu_offset_reg[1]/CLK
         Clock: (R) clk
      Endpoint: (R) cpu/pc_add4_reg[30]/D
         Clock: (R) clk

                   Capture       Launch
    Clock Edge:+      1300            0
   Src Latency:+                      0
   Net Latency:+         0 (I)
       Arrival:=      1300

         Setup:-
   Uncertainty:-       100
 Required Time:=      1193
  Launch Clock:-
     Data Path:-      1121
         Slack:=        72
```

#### Critical Path (post-cts)

- Clock Period = 1.3 ns
- Slack = 6.162 ps
- Startpoint
  - dcache/from\_mem\_reg
- Endpoint
  - alu\_out\_MEM\_reg

```
Path 1: MET (6.162 ps) Setup Check with Pin cpu/alu_out_MEM_reg[0]/CLK->D
               View: PVT_0P63V_100C.setup_view
              Group: reg2reg
         Startpoint: (R) mem/genblk1.dcache/from_mem_reg/CLK
              Clock: (R) clk
           Endpoint: (R) cpu/alu_out_MEM_reg[0]/D
              Clock: (R) clk

                        Capture        Launch
         Clock Edge:+  1300.000         0.000
        Src Latency:+  -250.966      -250.966
        Net Latency:+   228.900 (P)   272.800 (P)
            Arrival:=  1277.934        21.834

              Setup:-     6.339
        Uncertainty:-   100.000
        Cppr Adjust:+     7.700
      Required Time:=  1179.295
       Launch Clock:-    21.834
          Data Path:-  1151.299
              Slack:=     6.162
```

Both the post\_syn and post\_par critical paths run from the READ\_MEM state in the dcache to registers in the RISC-V core: the dcache must wait for mem\_resp\_valid and mem\_req\_ready before it can generate the STALL signal.
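
The post-cts slack can be re-derived from the individual report entries. The sketch below assumes the usual setup-check bookkeeping (required time = capture arrival − setup − uncertainty + CPPR adjustment; slack = required time − launch clock arrival − data path delay); the function name and argument names are illustrative, and all values are in ps.

```python
def setup_slack(clock_edge, src_latency, net_latency,
                setup, uncertainty, cppr, launch_clock, data_path):
    """Recompute the setup slack (ps) from the entries of a timing report."""
    capture_arrival = clock_edge + src_latency + net_latency
    required_time = capture_arrival - setup - uncertainty + cppr
    return required_time - launch_clock - data_path


if __name__ == "__main__":
    # Values copied from the post-cts report
    slack = setup_slack(
        clock_edge=1300.000, src_latency=-250.966, net_latency=228.900,
        setup=6.339, uncertainty=100.000, cppr=7.700,
        launch_clock=21.834, data_path=1151.299,
    )
    print(f"post-cts slack = {slack:.3f} ps")
```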

# Power Analysis (Switching Power)

| addi.hex                                   | Setup view | Hold view |
|--------------------------------------------|------------|-----------|
| Switching power, static analysis (α = 0.2) | 11.56      | 17.25     |
| Switching power, dynamic analysis          | 1.39       | 2.29      |
| Activity factor                            | 2.4%       | 2.7%      |

| final.hex                                  | Setup view | Hold view |
|--------------------------------------------|------------|-----------|
| Switching power, static analysis (α = 0.2) | 11.56      | 17.25     |
| Switching power, dynamic analysis          | 1.18       | 1.94      |
| Activity factor                            | 2.0%       | 2.2%      |
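
The activity-factor rows are numerically consistent with scaling the assumed factor of 0.2 by the ratio of dynamic to static switching power. That relation is inferred from the numbers in the tables, not stated in the source, so the Python sketch below is only a plausibility check with illustrative names.

```python
ALPHA_STATIC = 0.2  # activity factor assumed by the static estimate


def effective_activity(dynamic_power, static_power, alpha=ALPHA_STATIC):
    """Scale the assumed activity factor by the dynamic/static power ratio."""
    return alpha * dynamic_power / static_power


if __name__ == "__main__":
    # (dynamic, static) switching power pairs taken from the two tables
    cases = {
        "addi.hex setup": (1.39, 11.56),
        "addi.hex hold": (2.29, 17.25),
        "final.hex setup": (1.18, 11.56),
        "final.hex hold": (1.94, 17.25),
    }
    for name, (dyn, stat) in cases.items():
        print(f"{name:16s} {100 * effective_activity(dyn, stat):.1f}%")
```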

#### Summary

- Optimization for frequency
  - 5-stage pipeline
- Optimization for cycles
  - Branch Predictor (2-bit/ 1-bit/ always taken)
  - Write Back Policy
  - Write Buffer in D\$ (Using two FSMs)
- Optimization for area
  - Read-only I\$